We use the December 2021 Current Population Survey Food Security Supplement data provided by the US Census Bureau.
The dataset contains 507 variables and roughly 120,000 observations.
Specific: Study the patterns in the data that affect food security, across factors such as state, county, income level, SNAP participation, race, immigrant status, work status, education level, and other demographic and socio-economic variables.
Measurable: Use EDA techniques to assess how strongly each factor contributes to food insecurity.
Achievable: Identify the variables that significantly affect food insecurity, which can then inform models for improving food security in households.
Relevant: Food is a basic human need, so this study can shed light on what the authorities, and we ourselves, can do to eradicate food insecurity.
Time-oriented: We use the data set for December 2021 so that the study can also show the effect of Covid-19 on food security.
Given the questions we are asking, we decided to select just 11 factors to work on.
A very significant limitation of our data is that we trimmed off many observations where the interview was either not conducted or not completed. Ideally we would account for these observations somehow, but due to time constraints we do not.
## 'data.frame': 71472 obs. of 12 variables:
## $ Id : Factor w/ 27922 levels "5185410966","8178510165",..: 16600 9378 9378 8472 8472 7861 7861 19375 19375 24604 ...
## $ States : Factor w/ 51 levels "1","2","4","5",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Family_Size : Factor w/ 14 levels "1","2","3","4",..: 1 2 2 2 2 2 2 2 2 1 ...
## $ Household_Income : Factor w/ 16 levels "1","2","3","4",..: 16 14 14 12 12 13 13 9 9 11 ...
## $ SNAP : Factor w/ 5 levels "-3","-2","-1",..: 3 3 3 5 5 3 3 5 5 3 ...
## $ Ethnicity : Factor w/ 24 levels "1","2","3","4",..: 1 1 1 1 1 1 1 2 2 1 ...
## $ Citizenship_status: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Number_of_Jobs : Factor w/ 4 levels "-1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
## $ Hours_on_Jobs : Factor w/ 88 levels "-4","-1","0",..: 67 43 2 43 43 62 43 2 2 2 ...
## $ Education_Level : Factor w/ 17 levels "-1","31","32",..: 14 15 1 14 14 10 10 7 10 5 ...
## $ FoodSecurity_score: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 2 2 1 ...
## $ PRNMCHLD : Factor w/ 12 levels "0","1","2","3",..: 1 2 1 1 1 1 1 1 1 1 ...
Our response variable is Food Security, which has four levels:
High Food Security: No reported indications of food-access problems or limitations.
Marginal Food Security: One or two reported signs, usually anxiety over food availability or scarcity in the home. There is little to no evidence that diets or food intake have changed.
Low Food Security: Reports of reduced quality, variety, or desirability of diet. Little to no indication of reduced food intake.
Very Low Food Security: Reports of numerous signs of altered eating habits and decreased food intake.
We use Fisher's Exact Test instead of the chi-square test because many levels have a low frequency of observations.
Our null hypothesis is that Ethnicity and Food Security Status are independent of each other.
We take alpha to be 5%.
##
## Fisher's Exact Test for Count Data with simulated p-value (based on
## 2000 replicates)
##
## data: FS_Subset$Ethnicity and FS_Subset$FoodSecurity_score
## p-value = 0.0004998
## alternative hypothesis: two.sided
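The reported p-value can be checked with simple arithmetic. When the p-value is simulated, R's `fisher.test` reports (1 + k) / (B + 1), where B is the number of replicates and k counts the simulated tables at least as extreme as the observed one. A minimal Python sketch of that arithmetic follows; the variable names are illustrative, and k = 0 is an assumption inferred from the output.

```python
# Smallest p-value a Monte Carlo test can report with B replicates,
# using the (1 + k) / (B + 1) estimator.
B = 2000   # replicates reported in the output above
k = 0      # assumption: no simulated table was as extreme as the observed one

p_min = (1 + k) / (B + 1)
print(f"{p_min:.4g}")   # prints 0.0004998
```

The value matches the output, so the p-value is at the floor for 2000 replicates. Because it is below alpha = 0.05, we reject the null hypothesis that Ethnicity and Food Security Status are independent.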
We use the chi-square test.
Our null hypothesis is that Citizenship Status and Food Security Status are independent of each other.
We take alpha to be 5%.
##
## Pearson's Chi-squared test
##
## data: FS_Subset$Citizenship_status and FS_Subset$FoodSecurity_score
## X-squared = 437.62, df = 12, p-value < 2.2e-16
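The degrees of freedom in this output follow from the table dimensions: for a test of independence on an r x c table, df = (r - 1)(c - 1). The Python sketch below applies that rule to the level counts listed in the `str()` output near the top of this report. It assumes every listed level actually appears in the tested table, which the report does not confirm for each variable.

```python
def chi_square_df(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for a test of independence on an r x c table."""
    return (n_rows - 1) * (n_cols - 1)

# Level counts taken from the str() output earlier in this report.
food_security_levels = 4
factor_levels = {
    "Citizenship_status": 5,
    "Household_Income": 16,
    "Family_Size": 14,
    "Education_Level": 17,
}

for name, levels in factor_levels.items():
    print(name, chi_square_df(levels, food_security_levels))
```

For Citizenship_status this gives (5 - 1)(4 - 1) = 12, matching the df reported above. With a p-value below 2.2e-16, we reject the null hypothesis of independence.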
We use the chi-square test.
Our null hypothesis is that SNAP Status and Food Security Status are independent of each other.
We take alpha to be 5%.
##
## Pearson's Chi-squared test
##
## data: chi_test_SNAP
## X-squared = 764.1, df = 3, p-value < 2.2e-16
## Outcome + Outcome - Total Inc risk * Odds
## Exposed + 4471 2737 7208 62.0 1.63
## Exposed - 14258 4008 18266 78.1 3.56
## Total 18729 6745 25474 73.5 2.78
##
## Point estimates and 95% CIs:
## -------------------------------------------------------------------
## Inc risk ratio 0.79 (0.78, 0.81)
## Odds ratio 0.46 (0.43, 0.49)
## Attrib risk in the exposed * -16.03 (-17.30, -14.76)
## Attrib fraction in the exposed (%) -25.84 (-28.34, -23.40)
## Attrib risk in the population * -4.54 (-5.34, -3.73)
## Attrib fraction in the population (%) -6.17 (-6.68, -5.66)
## -------------------------------------------------------------------
## Uncorrected chi2 test that OR = 1: chi2(1) = 682.162 Pr>chi2 = <0.001
## Fisher exact test that OR = 1: Pr>chi2 = <0.001
## Wald confidence limits
## CI: confidence interval
## * Outcomes per 100 population units
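The point estimates in this output can be reproduced directly from the four cell counts. The Python sketch below recomputes the risk ratio, odds ratio, and attributable risk. The output does not state which group is coded "Exposed +" or which category is "Outcome +", so the sketch keeps the output's own labels rather than assuming a meaning.

```python
# Counts copied from the 2x2 table above
# (rows: Exposed + / Exposed -, columns: Outcome + / Outcome -).
a, b = 4471, 2737      # Exposed +
c, d = 14258, 4008     # Exposed -

risk_exposed = a / (a + b)        # 62.0 per 100 in the output
risk_unexposed = c / (c + d)      # 78.1 per 100 in the output

risk_ratio = risk_exposed / risk_unexposed
odds_ratio = (a * d) / (b * c)
attrib_risk = 100 * (risk_exposed - risk_unexposed)   # per 100 population units

print(round(risk_ratio, 2), round(odds_ratio, 2), round(attrib_risk, 2))
```

A risk ratio below 1 means "Outcome +" is less common in the "Exposed +" group than in the "Exposed -" group, and both confidence intervals in the output exclude 1.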
## Id Number_of_Jobs Hours_on_Jobs Education_Level
## 1 404006407110031 -1 65 43
## 7 147240092351000 -1 40 44
## 8 147240092351000 -1 -1 -1
## 16 128450301231000 -1 40 43
## 17 128450301231000 -1 40 43
## 18 114580195861000 -1 60 39
## FoodSecurity_score
## 1 High Food Security
## 7 High Food Security
## 8 High Food Security
## 16 High Food Security
## 17 High Food Security
## 18 High Food Security
## 'data.frame': 71472 obs. of 5 variables:
## $ Id : Factor w/ 27922 levels "5185410966","8178510165",..: 16600 9378 9378 8472 8472 7861 7861 19375 19375 24604 ...
## $ Number_of_Jobs : Factor w/ 4 levels "-1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
## $ Hours_on_Jobs : Factor w/ 88 levels "-4","-1","0",..: 67 43 2 43 43 62 43 2 2 2 ...
## $ Education_Level : Factor w/ 17 levels "-1","31","32",..: 14 15 1 14 14 10 10 7 10 5 ...
## $ FoodSecurity_score: Factor w/ 4 levels "High Food Security",..: 1 1 1 1 1 1 1 2 2 1 ...
## Not Applicable
## 12558
## Less Than 1st Grade
## 144
## 1st, 2nd, 3rd Or 4th Grade
## 251
## 5th Or 6th Grade
## 446
## 7th Or 8th Grade
## 912
## 9th Grade
## 1241
## 10th Grade
## 1538
## 11th Grade
## 1625
## 12th Grade No Diploma
## 910
## High School Grad-Diploma Or Equiv (Ged)
## 16004
## Some College But No Degree
## 9492
## Associate Degree-Occupational/Vocational
## 2454
## Associate Degree-Academic Program
## 3257
## Bachelors Degree
## 12871
## Masters Degree
## 5720
## Professional School Deg
## 858
## Doctorate Degree
## 1191
## Not Applicable 2 Jobs 3 Jobs 4 or more jobs
## 69584 1706 159 23
## 'data.frame': 71472 obs. of 5 variables:
## $ Id : Factor w/ 27922 levels "5185410966","8178510165",..: 16600 9378 9378 8472 8472 7861 7861 19375 19375 24604 ...
## $ Number_of_Jobs : Factor w/ 4 levels "Not Applicable",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Hours_on_Jobs : num 67 43 2 43 43 62 43 2 2 2 ...
## $ Education_Level : Factor w/ 17 levels "Not Applicable",..: 14 15 1 14 14 10 10 7 10 5 ...
## $ FoodSecurity_score: Factor w/ 4 levels "High Food Security",..: 1 1 1 1 1 1 1 2 2 1 ...
Here:

* 1 High Food Security
* 2 Marginal Food Security
* 3 Low Food Security
* 4 Very Low Food Security
* -9 No Response

In the graphs above, people report High Food Security irrespective of the number of jobs. But let's use the chi-square test to see whether the two are really independent of each other.
## 'data.frame': 1888 obs. of 5 variables:
## $ Id : Factor w/ 1734 levels "13041104291",..: 769 35 419 512 1326 1244 535 222 82 974 ...
## $ Number_of_Jobs : Factor w/ 3 levels "2 Jobs","3 Jobs",..: 1 1 1 1 1 2 1 1 1 2 ...
## $ Hours_on_Jobs : num 38 43 33 48 40 13 23 33 13 43 ...
## $ Education_Level : Factor w/ 15 levels "1st, 2nd, 3rd Or 4th Grade",..: 13 12 10 13 8 12 13 12 8 13 ...
## $ FoodSecurity_score: Factor w/ 4 levels "High Food Security",..: 1 1 1 1 1 1 1 1 1 4 ...
## 'data.frame': 1888 obs. of 5 variables:
## $ Id : Factor w/ 27922 levels "5185410966","8178510165",..: 12278 466 6883 8428 21354 20006 8726 3657 1472 15457 ...
## $ Number_of_Jobs : Factor w/ 4 levels "Not Applicable",..: 2 2 2 2 2 3 2 2 2 3 ...
## $ Hours_on_Jobs : num 38 43 33 48 40 13 23 33 13 43 ...
## $ Education_Level : Factor w/ 17 levels "Not Applicable",..: 15 14 12 15 10 14 15 14 10 15 ...
## $ FoodSecurity_score: Factor w/ 4 levels "High Food Security",..: 1 1 1 1 1 1 1 1 1 4 ...
| | High Food Security | Marginal Food Security | Low Food Security | Very Low Food Security |
|---|---|---|---|---|
| 2 Jobs | 1425 | 115 | 112 | 54 |
| 3 Jobs | 136 | 7 | 7 | 9 |
| 4 or more jobs | 19 | 2 | 0 | 2 |
##
## Pearson's Chi-squared test
##
## data: contable_number_of_jobs
## X-squared = 8.4874, df = 6, p-value = 0.2045
The test gave a warning because the expected counts for some cells are very low. The p-value for the chi-square test is 0.2045, which is greater than our alpha of 0.05. Hence we fail to reject the null hypothesis: Number of Jobs does not show a significant association with Food Security.
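To make the test concrete, the statistic can be recomputed by hand from the contingency table above. The Python sketch below builds the expected counts under independence, sums (observed - expected)^2 / expected, and converts the result to a p-value. The p-value uses a closed form that is valid only when the degrees of freedom are even, which holds here. The helper name `chi2_sf_even_df` is illustrative.

```python
import math

# Observed counts from the Number_of_Jobs table above
# (columns: High, Marginal, Low, Very Low Food Security).
observed = [
    [1425, 115, 112, 54],   # 2 Jobs
    [136, 7, 7, 9],         # 3 Jobs
    [19, 2, 0, 2],          # 4 or more jobs
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed form valid for even df only."""
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

p_value = chi2_sf_even_df(stat, df)
small_cells = sum(e < 5 for row in expected for e in row)

print(round(stat, 3), df, round(p_value, 4), small_cells)
```

The statistic and p-value agree with the R output (about 8.487 and 0.2045). Three expected counts, all in the "4 or more jobs" row, fall below 5, which is what triggers the warning.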
### Education Level
## 'data.frame': 58914 obs. of 5 variables:
## $ Id : Factor w/ 27922 levels "5185410966","8178510165",..: 16600 9378 8472 8472 7861 7861 19375 19375 24604 32 ...
## $ Number_of_Jobs : Factor w/ 4 levels "Not Applicable",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Hours_on_Jobs : num 67 43 43 43 62 43 2 2 2 43 ...
## $ Education_Level : Factor w/ 16 levels "Less Than 1st Grade",..: 13 14 13 13 9 9 6 9 4 13 ...
## $ FoodSecurity_score: Factor w/ 4 levels "High Food Security",..: 1 1 1 1 1 1 2 2 1 1 ...
## Var1 Freq
## 1 Less Than 1st Grade 144
## 2 1st, 2nd, 3rd Or 4th Grade 251
## 3 5th Or 6th Grade 446
## 4 7th Or 8th Grade 912
## 5 9th Grade 1241
## 6 10th Grade 1538
## 7 11th Grade 1625
## 8 12th Grade No Diploma 910
## 9 High School Grad-Diploma Or Equiv (Ged) 16004
## 10 Some College But No Degree 9492
## 11 Associate Degree-Occupational/Vocational 2454
## 12 Associate Degree-Academic Program 3257
## 13 Bachelors Degree 12871
## 14 Masters Degree 5720
## 15 Professional School Deg 858
## 16 Doctorate Degree 1191
The education codes are interpreted as follows:
31 LESS THAN 1ST GRADE
32 1ST, 2ND, 3RD OR 4TH GRADE
33 5TH OR 6TH GRADE
34 7TH OR 8TH GRADE
35 9TH GRADE
36 10TH GRADE
37 11TH GRADE
38 12TH GRADE NO DIPLOMA
39 HIGH SCHOOL GRAD-DIPLOMA OR EQUIV (GED)
40 SOME COLLEGE BUT NO DEGREE
41 ASSOCIATE DEGREE-OCCUPATIONAL/VOCATIONAL
42 ASSOCIATE DEGREE-ACADEMIC PROGRAM
43 BACHELOR’S DEGREE (EX: BA, AB, BS)
44 MASTER’S DEGREE (EX: MA, MS, MEng, MEd, MSW)
45 PROFESSIONAL SCHOOL DEG (EX: MD, DDS, DVM)
46 DOCTORATE DEGREE (EX: PhD, EdD)
## Less Than 1st Grade
## 144
## 1st, 2nd, 3rd Or 4th Grade
## 251
## 5th Or 6th Grade
## 446
## 7th Or 8th Grade
## 912
## 9th Grade
## 1241
## 10th Grade
## 1538
## 11th Grade
## 1625
## 12th Grade No Diploma
## 910
## High School Grad-Diploma Or Equiv (Ged)
## 16004
## Some College But No Degree
## 9492
## Associate Degree-Occupational/Vocational
## 2454
## Associate Degree-Academic Program
## 3257
## Bachelors Degree
## 12871
## Masters Degree
## 5720
## Professional School Deg
## 858
## Doctorate Degree
## 1191
| | High Food Security | Marginal Food Security | Low Food Security | Very Low Food Security |
|---|---|---|---|---|
| Not Applicable | 9703 | 1238 | 1199 | 418 |
| Less Than 1st Grade | 95 | 20 | 17 | 12 |
| 1st, 2nd, 3rd Or 4th Grade | 158 | 36 | 41 | 16 |
| 5th Or 6th Grade | 276 | 55 | 82 | 33 |
| 7th Or 8th Grade | 621 | 111 | 129 | 51 |
| 9th Grade | 894 | 141 | 134 | 72 |
| 10th Grade | 1094 | 190 | 167 | 87 |
| 11th Grade | 1168 | 182 | 178 | 97 |
| 12th Grade No Diploma | 639 | 106 | 120 | 45 |
| High School Grad-Diploma Or Equiv (Ged) | 12435 | 1575 | 1328 | 666 |
| Some College But No Degree | 7751 | 744 | 626 | 371 |
| Associate Degree-Occupational/Vocational | 2033 | 182 | 160 | 79 |
| Associate Degree-Academic Program | 2773 | 215 | 162 | 107 |
| Bachelors Degree | 11860 | 477 | 351 | 183 |
| Masters Degree | 5416 | 140 | 108 | 56 |
| Professional School Deg | 828 | 17 | 7 | 6 |
| Doctorate Degree | 1157 | 13 | 11 | 10 |
##
## Pearson's Chi-squared test
##
## data: contable_edu
## X-squared = 3136.8, df = 48, p-value < 2.2e-16
With a p-value below 2.2e-16, we reject the null hypothesis of independence: Education_Level has a significant association with the food security of respondents.
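The chi-square test shows that an association exists but not its direction. One way to see the direction is to compute, for a few education levels, the share of respondents in the High Food Security column. The Python sketch below does this for four rows copied from the table above; the choice of rows is illustrative only.

```python
# Rows copied from the Education_Level table above.
# Column order: High, Marginal, Low, Very Low Food Security.
rows = {
    "Less Than 1st Grade": [95, 20, 17, 12],
    "High School Grad-Diploma Or Equiv (Ged)": [12435, 1575, 1328, 666],
    "Bachelors Degree": [11860, 477, 351, 183],
    "Doctorate Degree": [1157, 13, 11, 10],
}

# Share of each education level that falls in the High Food Security column
high_share = {level: counts[0] / sum(counts) for level, counts in rows.items()}

for level, share in high_share.items():
    print(f"{level}: {share:.1%}")
```

Across these four rows the share with High Food Security rises with education, from about 66% to about 97%.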
### Hours_on_Jobs
## 'data.frame': 71472 obs. of 5 variables:
## $ Id : Factor w/ 27922 levels "5185410966","8178510165",..: 16600 9378 9378 8472 8472 7861 7861 19375 19375 24604 ...
## $ Number_of_Jobs : Factor w/ 4 levels "Not Applicable",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Hours_on_Jobs : num 67 43 2 43 43 62 43 2 2 2 ...
## $ Education_Level : Factor w/ 17 levels "Not Applicable",..: 14 15 1 14 14 10 10 7 10 5 ...
## $ FoodSecurity_score: Factor w/ 4 levels "High Food Security",..: 1 1 1 1 1 1 1 2 2 1 ...
## Var1 Freq
## 1 1 1921
## 2 2 37708
## 3 3 36
## 4 4 23
## 5 5 36
## 6 6 44
## 7 7 81
## 8 8 100
## 9 9 74
## 10 10 23
## 11 11 166
## 12 12 22
## 13 13 369
## 14 14 15
## 15 15 169
## 16 16 12
## 17 17 34
## 18 18 374
## 19 19 200
## 20 20 20
## 21 21 92
## 22 22 9
## 23 23 1209
## 24 24 19
## 25 25 38
## 26 26 19
## 27 27 302
## 28 28 580
## 29 29 37
## 30 30 33
## 31 31 79
## 32 32 21
## 33 33 1023
## 34 34 11
## 35 35 394
## 36 36 27
## 37 37 39
## 38 38 907
## 39 39 426
## 40 40 141
## 41 41 266
## 42 42 32
## 43 43 18496
## 44 44 11
## 45 45 159
## 46 46 64
## 47 47 82
## 48 48 1258
## 49 49 25
## 50 50 30
## 51 51 213
## 52 52 15
## 53 53 2050
## 54 54 1
## 55 55 28
## 56 56 13
## 57 57 13
## 58 58 427
## 59 59 33
## 60 60 5
## 61 61 22
## 62 62 906
## 63 63 1
## 64 64 1
## 65 65 1
## 66 66 5
## 67 67 85
## 68 68 7
## 69 69 6
## 70 70 3
## 71 71 1
## 72 72 146
## 73 73 47
## 74 74 1
## 75 75 2
## 76 76 22
## 77 77 2
## 78 78 82
## 79 79 24
## 80 80 2
## 81 81 1
## 82 82 1
## 83 83 1
## 84 84 1
## 85 85 16
## 86 86 4
## 87 87 5
## 88 88 23
## [1] 0
There are 0 respondents whose working hours vary.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 2.00 2.00 19.58 43.00 88.00
## [1] 2
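A caution on the numbers above. The raw rows printed earlier show Hours_on_Jobs values of 65, 40, -1, 40, 40, 60, while the numeric column for the same rows reads 67, 43, 2, 43, 43, 62. That suggests the numeric values are factor level positions rather than hours, which is the pattern seen when a factor is converted to numbers directly instead of through its labels (in R, `as.numeric(as.character(x))` is the usual idiom for the latter). The toy Python sketch below mimics the difference. The level set is a small hypothetical subset, not the survey's 88 levels.

```python
# Toy illustration: a factor stores each value as a 1-based position
# in its sorted list of levels.
raw_hours = [65, 40, -1, 40, 40, 60]         # raw values from the rows printed earlier
levels = sorted({-4, -1, 0, 40, 60, 65})     # hypothetical subset of the levels

codes = [levels.index(h) + 1 for h in raw_hours]   # what a direct conversion returns
hours = [levels[c - 1] for c in codes]             # going through the labels recovers hours

print(codes)   # [6, 4, 2, 4, 4, 5]
print(hours)   # [65, 40, -1, 40, 40, 60]
```

If the same thing happened here, the value 2 (37708 rows) stands for the -1 code and 43 for 40 hours. In that case the summary, ANOVA, and Tukey results are computed on level positions and would be worth re-running on the recovered hours.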
## Df Sum Sq Mean Sq F value Pr(>F)
## FoodSecurity_score 3 279623 93208 214.6 <2e-16 ***
## Residuals 71468 31045414 434
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Hours_on_Jobs ~ FoodSecurity_score, data = food_hoj)
##
## $FoodSecurity_score
## diff lwr upr
## Marginal Food Security-High Food Security -4.2140290 -4.972646 -3.4554123
## Low Food Security-High Food Security -5.6863555 -6.488530 -4.8841810
## Very Low Food Security-High Food Security -6.0702228 -7.206149 -4.9342962
## Low Food Security-Marginal Food Security -1.4723264 -2.531399 -0.4132541
## Very Low Food Security-Marginal Food Security -1.8561937 -3.186036 -0.5263518
## Very Low Food Security-Low Food Security -0.3838673 -1.739029 0.9712947
## p adj
## Marginal Food Security-High Food Security 0.0000000
## Low Food Security-High Food Security 0.0000000
## Very Low Food Security-High Food Security 0.0000000
## Low Food Security-Marginal Food Security 0.0020125
## Very Low Food Security-Marginal Food Security 0.0019070
## Very Low Food Security-Low Food Security 0.8860333
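The pairs Marginal vs High, Low vs High, Very Low vs High, Low vs Marginal, and Very Low vs Marginal (2-1, 3-1, 4-1, 3-2, 4-2) differ significantly in their means. Only Very Low vs Low (4-3) does not (adjusted p = 0.886).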
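Both the F value and the list of significant pairs can be read off the outputs above with a little arithmetic. The Python sketch below recomputes F as the ratio of the two mean squares, then flags a pair as significant at the 5% level when its Tukey interval excludes zero. The shortened pair names are illustrative.

```python
# ANOVA table values copied from the output above
ss_between, df_between = 279623, 3
ss_within, df_within = 31045414, 71468

ms_between = ss_between / df_between   # Mean Sq for FoodSecurity_score
ms_within = ss_within / df_within      # Mean Sq for Residuals
f_value = ms_between / ms_within

# Tukey 95% intervals (lwr, upr) copied from the output above
intervals = {
    "Marginal-High": (-4.972646, -3.4554123),
    "Low-High": (-6.488530, -4.8841810),
    "Very Low-High": (-7.206149, -4.9342962),
    "Low-Marginal": (-2.531399, -0.4132541),
    "Very Low-Marginal": (-3.186036, -0.5263518),
    "Very Low-Low": (-1.739029, 0.9712947),
}

# A pair differs significantly when its interval does not contain zero
significant = [pair for pair, (lwr, upr) in intervals.items() if lwr > 0 or upr < 0]

print(round(f_value, 1))   # 214.6
print(significant)
```

Five of the six intervals lie entirely below zero. The Very Low vs Low interval spans zero, consistent with its adjusted p-value of 0.886.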
### States, Family Size, and Household Income
## AL AK AZ AR CA CO CT DE DC FL GA HI ID IL IN IA
## 1207 970 1080 1258 6975 764 593 832 1208 2738 1421 1125 1304 2052 1265 884
## KS KY LA ME MD MA MI MN MS MO MT NE NV NH NJ NM
## 952 808 1606 564 963 1352 1762 999 1505 1099 1255 791 1007 994 1397 1242
## NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA
## 2580 1503 1143 1681 985 1247 1928 647 1028 840 1452 3946 1306 1030 1301 1404
## WV WI WY
## 1214 1092 1173
California has the highest number of respondents (6975), whereas Maine has the lowest (564). For comparison, I chose pairs of states with similar numbers of respondents: Alabama (1207) and Washington DC (1208), Florida (2738) and New York (2580), and Illinois (2052) and Pennsylvania (1928).
## Id States Family_Size Household_Income FoodSecurity_score
## 1 404006407110031 AL 1 16 High Food Security
## 7 147240092351000 AL 2 14 High Food Security
## 8 147240092351000 AL 2 14 High Food Security
## 16 128450301231000 AL 2 12 High Food Security
## 17 128450301231000 AL 2 12 High Food Security
## 18 114580195861000 AL 2 13 High Food Security
## PRNMCHLD
## 1 0
## 7 1
## 8 0
## 16 0
## 17 0
## 18 0
Reference to household income:
1 LESS THAN $5,000
2 5,000 TO 7,499
3 7,500 TO 9,999
4 10,000 TO 12,499
5 12,500 TO 14,999
6 15,000 TO 19,999
7 20,000 TO 24,999
8 25,000 TO 29,999
9 30,000 TO 34,999
10 35,000 TO 39,999
11 40,000 TO 49,999
12 50,000 TO 59,999
13 60,000 TO 74,999
14 75,000 TO 99,999
15 100,000 TO 149,999
16 150,000 OR MORE
##
## Pearson's Chi-squared test
##
## data: income_t
## X-squared = 9512.9, df = 45, p-value < 2.2e-16
With a p-value below 2.2e-16, we reject the null hypothesis of independence: Household income is significantly associated with food insecurity.
## Warning in chisq.test(family_t): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: family_t
## X-squared = 1691.7, df = 39, p-value < 2.2e-16
With a p-value below 2.2e-16, we reject the null hypothesis of independence: Family size is significantly associated with food insecurity.
As the boxplot shows, when family size is large (more than 6 people), food insecurity is high. Household income also has a direct effect on food security: when household income is higher than $40,000, the food security score is low (a low score corresponds to High Food Security).
## 1 2 3 4 5 6 7 8 9 10 11 12 13
## 9210 21816 12753 13724 7810 3660 1344 560 288 120 88 72 13
## 14
## 14
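The frequency output above has 14 levels summing to 71472, so it appears to be Family_Size. It shows how few observations sit behind the statement about large families. The Python sketch below computes the share of rows with a family size above 6.

```python
# Counts copied from the frequency output above (family sizes 1 to 14)
counts = [9210, 21816, 12753, 13724, 7810, 3660, 1344, 560, 288, 120, 88, 72, 13, 14]

total = sum(counts)
large = sum(counts[6:])   # family sizes 7 and above, i.e. more than 6 people

print(total, large, f"{large / total:.1%}")   # 71472 2499 3.5%
```

Only about 3.5% of rows come from families of more than 6 people, and the largest sizes have very few rows each. Such sparse cells are a plausible reason for the chi-squared approximation warning on the Family_Size test.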
An interesting observation from this graph is that when family size is bigger, household income is high and the family has high food security. Taken separately, Family size and Household income each have a significant relationship with food security. When they are combined, however, the result is different. For further analysis, we need to consider the age and employment type of the family members.